AITopics | textual representation

Collaborating Authors

textual representation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

0df1738319f8c6e15b58cb16ea3cfa57-Paper-Conference.pdf

Neural Information Processing SystemsFeb-7-2026, 20:14:55 GMT

arxiv preprint arxiv, language model, representation, (15 more...)

Neural Information Processing Systems

Country:

Africa > Middle East > Morocco (0.04)
South America > Colombia (0.04)
North America > United States > Utah (0.04)
(7 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Inverse-LLaVA: Eliminating Alignment Pre-training Through Text-to-Vision Mapping

Zhan, Xuhui, Derr, Tyler

arXiv.org Artificial IntelligenceAug-19-2025

Traditional multimodal learning approaches require expensive alignment pre-training to bridge vision and language modalities, typically projecting visual features into discrete text token spaces. We challenge both fundamental assumptions underlying this paradigm by proposing Inverse-LLaVA, a novel approach that eliminates alignment pre-training entirely while inverting the conventional mapping direction. Rather than projecting visual features to text space, our method maps text embeddings into continuous visual representation space and performs fusion within transformer intermediate layers. Through selective additive components in attention mechanisms, we enable dynamic integration of visual and textual representations without requiring massive image-text alignment datasets. Comprehensive experiments across nine multimodal benchmarks demonstrate nuanced performance trade-offs: Inverse-LLaVA achieves notable improvements on reasoning-intensive and cognitive tasks (MM-VET: +0.2%, VizWiz: +1.8%, ScienceQA: +0.2%, cognitive reasoning: +27.2%), while showing expected decreases in perception tasks requiring memorized visual-text associations (celebrity recognition: -49.5%, OCR: -21.3%). These results provide the first empirical evidence that alignment pre-training is not necessary for effective multimodal learning, particularly for complex reasoning tasks. Our work establishes the feasibility of a new paradigm that reduces computational requirements by 45%, challenges conventional wisdom about modality fusion, and opens new research directions for efficient multimodal architectures that preserve modality-specific characteristics. Our project website with code and additional resources is available at https://inverse-llava.github.io.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2508.12466

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Using Phonemes in cascaded S2S translation pipeline

Pilz, Rene, Schneider, Johannes

arXiv.org Artificial IntelligenceApr-24-2025

This paper explores the idea of using phonemes as a textual representation within a conventional multilingual simultaneous speech - to - speech translation pipeline, as opposed to the traditional reliance on text - based language representations. To investigate this, we trained an open - source sequence - to - sequence model on the WMT17 dataset in two formats: one using standard textual representation and the other employing phonemic representation. The performance o f both approaches was assessed using the BLEU metric. Our findings shows that the phonemic approach provides comparable quality but offers several advantages, including lower resource requirements or better suitability for low - resource languages.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2504.16234

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.94)

Add feedback

LLM as GNN: Graph Vocabulary Learning for Text-Attributed Graph Foundation Models

Zhu, Xi, Xue, Haochen, Zhao, Ziwei, Xu, Wujiang, Huang, Jingyuan, Guo, Minghao, Wang, Qifan, Zhou, Kaixiong, Zhang, Yongfeng

arXiv.org Artificial IntelligenceMar-5-2025

Text-Attributed Graphs (TAGs), where each node is associated with text descriptions, are ubiquitous in real-world scenarios. They typically exhibit distinctive structure and domain-specific knowledge, motivating the development of a Graph Foundation Model (GFM) that generalizes across diverse graphs and tasks. Despite large efforts to integrate Large Language Models (LLMs) and Graph Neural Networks (GNNs) for TAGs, existing approaches suffer from decoupled architectures with two-stage alignment, limiting their synergistic potential. Even worse, existing methods assign out-of-vocabulary (OOV) tokens to graph nodes, leading to graph-specific semantics, token explosion, and incompatibility with task-oriented prompt templates, which hinders cross-graph and cross-task transferability. To address these challenges, we propose PromptGFM, a versatile GFM for TAGs grounded in graph vocabulary learning. PromptGFM comprises two key components: (1) Graph Understanding Module, which explicitly prompts LLMs to replicate the finest GNN workflow within the text space, facilitating seamless GNN-LLM integration and elegant graph-text alignment; (2) Graph Inference Module, which establishes a language-based graph vocabulary ensuring expressiveness, transferability, and scalability, enabling readable instructions for LLM fine-tuning. Extensive experiments demonstrate our superiority and transferability across diverse graphs and tasks. The code is available at this: https://github.com/agiresearch/PromptGFM.

gnn, node, representation, (16 more...)

arXiv.org Artificial Intelligence

2503.03313

Country:

Europe > Austria > Vienna (0.15)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Singapore (0.04)
(19 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.70)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Graph2text or Graph2token: A Perspective of Large Language Models for Graph Learning

Yu, Shuo, Wang, Yingbo, Li, Ruolin, Liu, Guchun, Shen, Yanming, Ji, Shaoxiong, Li, Bowen, Han, Fengling, Zhang, Xiuzhen, Xia, Feng

arXiv.org Artificial IntelligenceJan-2-2025

Graphs are data structures used to represent irregular networks and are prevalent in numerous real-world applications. Previous methods directly model graph structures and achieve significant success. However, these methods encounter bottlenecks due to the inherent irregularity of graphs. An innovative solution is converting graphs into textual representations, thereby harnessing the powerful capabilities of Large Language Models (LLMs) to process and comprehend graphs. In this paper, we present a comprehensive review of methodologies for applying LLMs to graphs, termed LLM4graph. The core of LLM4graph lies in transforming graphs into texts for LLMs to understand and analyze. Thus, we propose a novel taxonomy of LLM4graph methods in the view of the transformation. Specifically, existing methods can be divided into two paradigms: Graph2text and Graph2token, which transform graphs into texts or tokens as the input of LLMs, respectively. We point out four challenges during the transformation to systematically present existing methods in a problem-oriented perspective. For practical concerns, we provide a guideline for researchers on selecting appropriate models and LLMs for different graphs and hardware constraints. We also identify five future research directions for LLM4graph.

graph, information, node, (14 more...)

arXiv.org Artificial Intelligence

2501.01124

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China > Liaoning Province > Dalian (0.04)
(5 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.34)

Industry: Education (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Enhancing In-Hospital Mortality Prediction Using Multi-Representational Learning with LLM-Generated Expert Summaries

Battula, Harshavardhan, Liu, Jiacheng, Srivastava, Jaideep

arXiv.org Artificial IntelligenceNov-25-2024

In-hospital mortality (IHM) prediction for ICU patients is critical for timely interventions and efficient resource allocation. While structured physiological data provides quantitative insights, clinical notes offer unstructured, context-rich narratives. This study integrates these modalities with Large Language Model (LLM)-generated expert summaries to improve IHM prediction accuracy. Using the MIMIC-III database, we analyzed time-series physiological data and clinical notes from the first 48 hours of ICU admission. Clinical notes were concatenated chronologically for each patient and transformed into expert summaries using Med42-v2 70B. A multi-representational learning framework was developed to integrate these data sources, leveraging LLMs to enhance textual data while mitigating direct reliance on LLM predictions, which can introduce challenges in uncertainty quantification and interpretability. The proposed model achieved an AUPRC of 0.6156 (+36.41%) and an AUROC of 0.8955 (+7.64%) compared to a time-series-only baseline. Expert summaries outperformed clinical notes or time-series data alone, demonstrating the value of LLM-generated knowledge. Performance gains were consistent across demographic groups, with notable improvements in underrepresented populations, underscoring the framework's equitable application potential. By integrating LLM-generated summaries with structured and unstructured data, the framework captures complementary patient information, significantly improving predictive performance. This approach showcases the potential of LLMs to augment critical care prediction models, emphasizing the need for domain-specific validation and advanced integration strategies for broader clinical adoption.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2411.16818

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (0.47)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Diagnostic Medicine (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

TEARS: Textual Representations for Scrutable Recommendations

Penaloza, Emiliano, Gouvert, Olivier, Wu, Haolun, Charlin, Laurent

arXiv.org Artificial IntelligenceOct-25-2024

Traditional recommender systems rely on high-dimensional (latent) embeddings for modeling user-item interactions, often resulting in opaque representations that lack interpretability. Moreover, these systems offer limited control to users over their recommendations. Inspired by recent work, we introduce TExtuAl Representations for Scrutable recommendations (TEARS) to address these challenges. Instead of representing a user's interests through a latent embedding, TEARS encodes them in natural text, providing transparency and allowing users to edit them. To do so, TEARS uses a modern LLM to generate user summaries based on user preferences. We find the summaries capture user preferences uniquely. Using these summaries, we take a hybrid approach where we use an optimal transport procedure to align the summaries' representation with the learned representation of a standard VAE for collaborative filtering. We find this approach can surpass the performance of three popular VAE models while providing user-controllable recommendations. We also analyze the controllability of TEARS through three simulated user tasks to evaluate the effectiveness of a user editing its summary.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.19302

Country:

North America > Canada > Quebec > Montreal (0.28)
North America > United States > New York > New York County > New York City (0.14)
North America > United States > District of Columbia > Washington (0.05)
(7 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Captions Speak Louder than Images (CASLIE): Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data

Ling, Xinyi, Peng, Bo, Du, Hanwen, Zhu, Zhihui, Ning, Xia

arXiv.org Artificial IntelligenceOct-22-2024

Leveraging multimodal data to drive breakthroughs in e-commerce applications through Multimodal Foundation Models (MFMs) is gaining increasing attention from the research community. However, there are significant challenges that hinder the optimal use of multimodal e-commerce data by foundation models: (1) the scarcity of large-scale, high-quality multimodal benchmark datasets; and (2) the lack of effective multimodal information integration methods. To address these challenges, in this paper, we introduce MMECInstruct, the first-ever, large-scale, and high-quality multimodal instruction dataset for e-commerce. We also develop CASLIE, a simple, lightweight, yet effective framework for integrating multimodal information for e-commerce. Leveraging MMECInstruct, we fine-tune a series of e-commerce MFMs within CASLIE, denoted as CASLIE models. Our comprehensive evaluation demonstrates that CASLIE models substantially outperform 5 categories of advanced baseline models in the in-domain evaluation. Moreover, CASLIE models show strong generalizability to out-of-domain settings. MMECInstruct and CASLIE models are publicly accessible through https://ninglab.github.io/CASLIE/.

large language model, machine learning, natural language, (23 more...)

arXiv.org Artificial Intelligence

2410.17337

Country:

North America > United States > Ohio (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Newfoundland and Labrador > Labrador (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Services > e-Commerce Services (1.00)

Technology:

Information Technology > e-Commerce (1.00)
Information Technology > Data Science (1.00)
Information Technology > Information Management (0.93)
(3 more...)

Add feedback

Empowering Pre-Trained Language Models for Spatio-Temporal Forecasting via Decoupling Enhanced Discrete Reprogramming

Wang, Hao, Han, Jindong, Fan, Wei, Liu, Hao

arXiv.org Artificial IntelligenceAug-24-2024

Spatio-temporal time series forecasting plays a critical role in various real-world applications, such as transportation optimization, energy management, and climate analysis. The recent advancements in Pre-trained Language Models (PLMs) have inspired efforts to reprogram these models for time series forecasting tasks, by leveraging their superior reasoning and generalization capabilities. However, existing approaches fall short in handling complex spatial inter-series dependencies and intrinsic intra-series frequency components, limiting their spatio-temporal forecasting performance. Moreover, the linear mapping of continuous time series to a compressed subset vocabulary in reprogramming constrains the spatio-temporal semantic expressivity of PLMs and may lead to potential information bottleneck. To overcome the above limitations, we propose \textsc{RePST}, a tailored PLM reprogramming framework for spatio-temporal forecasting. The key insight of \textsc{RePST} is to decouple the spatio-temporal dynamics in the frequency domain, allowing better alignment with the PLM text space. Specifically, we first decouple spatio-temporal data in Fourier space and devise a structural diffusion operator to obtain temporal intrinsic and spatial diffusion signals, making the dynamics more comprehensible and predictable for PLMs. To avoid information bottleneck from a limited vocabulary, we further propose a discrete reprogramming strategy that selects relevant discrete textual information from an expanded vocabulary space in a differentiable manner. Extensive experiments on four real-world datasets show that our proposed approach significantly outperforms state-of-the-art spatio-temporal forecasting models, particularly in data-scarce scenarios.

arxiv preprint arxiv, forecasting, language model, (13 more...)

arXiv.org Artificial Intelligence

2408.14505

Country:

Asia > China > Beijing > Beijing (0.05)
North America > United States > Alabama (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.46)

Industry:

Transportation (0.93)
Energy > Power Industry (0.35)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Ontological Relations from Word Embeddings

d'Aquin, Mathieu, Nauer, Emmanuel

arXiv.org Artificial IntelligenceAug-1-2024

It has been reliably shown that the similarity of word embeddings obtained from popular neural models such as BERT approximates effectively a form of semantic similarity of the meaning of those words. It is therefore natural to wonder if those embeddings contain enough information to be able to connect those meanings through ontological relationships such as the one of subsumption. If so, large knowledge models could be built that are capable of semantically relating terms based on the information encapsulated in word embeddings produced by pre-trained models, with implications not only for ontologies (ontology matching, ontology evolution, etc.) but also on the ability to integrate ontological knowledge in neural models. In this paper, we test how embeddings produced by several pre-trained models can be used to predict relations existing between classes and properties of popular upper-level and general ontologies. We show that even a simple feed-forward architecture on top of those embeddings can achieve promising accuracies, with varying generalisation abilities depending on the input data. To achieve that, we produce a dataset that can be used to further enhance those models, opening new possibilities for applications integrating knowledge from web ontologies.

language model, ontology, relation, (15 more...)

arXiv.org Artificial Intelligence

2408.00444

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report (0.50)
Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback